Scaling Instruction-Finetuned Language Models
In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data.
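As an illustrative aside (not from the paper), the sketch below shows how a chain-of-thought finetuning example might be phrased as an (input, target) pair, where the target includes intermediate reasoning before the final answer; the specific wording and dict layout here are hypothetical.

```python
# Hypothetical illustration of a chain-of-thought finetuning example:
# the input is an instruction-phrased question, and the target contains
# step-by-step reasoning followed by the final answer.
cot_example = {
    "input": (
        "Answer the following question by reasoning step by step. "
        "The cafeteria had 23 apples. If they used 20 for lunch and "
        "bought 6 more, how many apples do they have?"
    ),
    "target": (
        "The cafeteria started with 23 apples. They used 20, leaving "
        "23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9."
    ),
}
```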
We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, chain-of-thought), and evaluation benchmarks.
We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B.
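As a usage note (not part of the paper), a minimal sketch of running zero-shot inference with one of the released Flan-T5 checkpoints via the Hugging Face transformers library follows; the prompt text is a hypothetical example.

```python
# Minimal sketch: zero-shot instruction following with a released
# Flan-T5 checkpoint, using the Hugging Face transformers library.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# An instruction-phrased prompt, in the style used for instruction finetuning.
prompt = "Answer the following question. What is the boiling point of water in Celsius?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```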